{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 模型构造\n",
    "\n",
    "让我们回顾一下在[“多层感知机的简洁实现”](../chapter_deep-learning-basics/mlp-gluon.ipynb)一节中含单隐藏层的多层感知机的实现方法。我们首先构造`Sequential`实例，然后依次添加两个全连接层。其中第一层的输出大小为256，即隐藏层单元个数是256；第二层的输出大小为10，即输出层单元个数是10。我们在上一章的其他\n",
    "节中也使用了`Sequential`类构造模型。这里我们介绍另外一种基于`Module`类的模型构造方法：它让模型构造更加灵活。\n",
    "\n",
    "\n",
    "## 继承`Module`类来构造模型\n",
    "\n",
    "`Module`类是`nn`模块里提供的一个模型构造类，我们可以继承它来定义我们想要的模型。下面继承`Module`类构造本节开头提到的多层感知机。这里定义的`MLP`类重载了`Module`类的`__init__`函数和`forward`函数。它们分别用于创建模型参数和定义前向计算。前向计算也即正向传播。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch\n",
    "from torch import nn\n",
    "\n",
    "class MLP(nn.Module):\n",
    "    # 声明带有模型参数的层，这里声明了两个全连接层\n",
    "    def __init__(self, **kwargs):\n",
    "        # 调用MLP父类Module的构造函数来进行必要的初始化。这样在构造实例时还可以指定其他函数\n",
    "        # 参数，如“模型参数的访问、初始化和共享“一节将介绍的模型参数params\n",
    "        super(MLP, self).__init__(**kwargs)\n",
    "        self.hidden = nn.Linear(20, 256) # 隐藏层\n",
    "        self.activation = nn.ReLU()\n",
    "        self.output = nn.Linear(256, 10) # 输出层\n",
    "        \n",
    "    # 定义模型的前向计算，即如何根据输入x计算返回所需要的模型输出\n",
    "    def forward(self, x):\n",
    "        x = self.hidden(x)\n",
    "        x = self.activation(x)\n",
    "        x = self.output(x)\n",
    "        return x"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "以上的`MLP`类中无须定义反向传播函数。系统将通过自动求梯度而自动生成反向传播所需的`backward`函数。\n",
    "\n",
    "我们可以实例化`MLP`类得到模型变量`net`。下面的代码初始化`net`并传入输入数据`X`做一次前向计算。其中，`net(X)`会调用`MLP`继承自`Module`类的`__call__`函数，这个函数将调用`MLP`类定义的`forward`函数来完成前向计算。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([[-0.0406,  0.1010,  0.0255, -0.0043,  0.0051,  0.0108, -0.3359,  0.0073,\n",
       "         -0.0664, -0.0695],\n",
       "        [-0.0901,  0.1338,  0.0629,  0.0217,  0.2059,  0.0741, -0.2959, -0.1462,\n",
       "          0.0120, -0.0442]], grad_fn=<AddmmBackward>)"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X = torch.rand(2, 20)\n",
    "net = MLP()\n",
    "net(X)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "注意，这里并没有将`Module`类命名为`Layer`（层）或者`Model`（模型）之类的名字，这是因为该类是一个可供自由组建的部件。它的子类既可以是一个层（如nn提供的`Linear`类），又可以是一个模型（如这里定义的`MLP`类），或者是模型的一个部分。我们下面通过两个例子来展示它的灵活性。\n",
    "\n",
    "## `Sequential`类继承自`Module`类\n",
    "\n",
    "我们刚刚提到，`Module`类是一个通用的部件。事实上，`Sequential`类继承自`Module`类。当模型的前向计算为简单串联各个层的计算时，可以通过更加简单的方式定义模型。这正是`Sequential`类的目的：它提供`add_module`函数来逐一添加串联的`Module`子类实例，而模型的前向计算就是将这些实例按添加的顺序逐一计算。\n",
    "\n",
    "下面我们实现一个与`Sequential`类有相同功能的`MySequential`类。这或许可以帮助读者更加清晰地理解`Sequential`类的工作机制。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "class MySequential(nn.Module):\n",
    "    def __init__(self, ** kwargs):\n",
    "        super(MySequential, self).__init__(**kwargs)\n",
    "        \n",
    "    def add_module(self, name, module):\n",
    "        # module是一个Module子类实例，将设它有独一无二的名字。我们将它保存在Module类的\n",
    "        # 成员变量__modules里，其类型是OrderedDict。\n",
    "        self._modules[name] = module\n",
    "    \n",
    "    def forward(self, x):\n",
    "        # OrderDict保证会按照成员添加时的顺序遍历成员\n",
    "        for module in self._modules.values():\n",
    "            x = module(x)\n",
    "        return x"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "我们用`MySequential`类来实现前面描述的`MLP`类，并使用随机初始化的模型做一次前向计算。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([[ 0.1882, -0.1073,  0.0855, -0.2296,  0.3447,  0.1496, -0.0214,  0.3606,\n",
       "         -0.2185, -0.0730],\n",
       "        [ 0.2162,  0.0489,  0.1943, -0.2400,  0.3910,  0.2237, -0.0612,  0.2907,\n",
       "         -0.1091, -0.2108]], grad_fn=<AddmmBackward>)"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "net = MySequential()\n",
    "net.add_module(\"hidden\", nn.Linear(20, 256))\n",
    "net.add_module(\"activation\", nn.ReLU())\n",
    "net.add_module(\"output\", nn.Linear(256, 10))\n",
    "net(X)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**注：MySequential的结果与MLP的结果不同是因为Linear层的参数是随机初始化的。**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "可以观察到这里`MySequential`类的使用跟[“多层感知机的简洁实现”](../chapter_deep-learning-basics/mlp-gluon.ipynb)一节中`Sequential`类的使用没什么区别。\n",
    "\n",
    "\n",
    "## 构造复杂的模型\n",
    "\n",
    "虽然`Sequential`类可以使模型构造更加简单，且不需要定义`forward`函数，但直接继承`Module`类可以极大地拓展模型构造的灵活性。下面我们构造一个稍微复杂点的网络`FancyMLP`。在这个网络中，我们通过`get_constant`函数创建训练中不被迭代的参数，即常数参数。在前向计算中，除了使用创建的常数参数外，我们还使用`torch`的函数和Python的控制流，并多次调用相同的层。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "class FancyMLP(nn.Module):\n",
    "    def __init__(self, **kwargs):\n",
    "        super(FancyMLP, self).__init__(**kwargs)\n",
    "        self.rand_weight = nn.Parameter(torch.rand(20, 20), requires_grad=False)\n",
    "        self.linear = nn.Linear(20, 20)\n",
    "        self.activation = nn.ReLU()\n",
    "    \n",
    "    def forward(self, x):\n",
    "        x = self.linear(x)\n",
    "        x = self.activation(x)\n",
    "        # 使用创建的常数参数，以及nn的relu函数和mm函数\n",
    "        x = torch.relu(x.mm(self.rand_weight.data) + 1)\n",
    "        # 复用全连接层。等价于两个全连接层共享参数\n",
    "        x = self.linear(x)\n",
    "        x = self.activation(x)\n",
    "        # 控制流，这里我们需要调用item函数来返回标量进行比较\n",
    "        while x.norm().item() > 1:\n",
    "            x /= 2\n",
    "        if x.norm().item() < 0.8:\n",
    "            x *= 10\n",
    "        return x.sum()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "在这个`FancyMLP`模型中，我们使用了常数权重`rand_weight`（注意它不是模型参数）、做了矩阵乘法操作（`torch.dot`）并重复使用了相同的`Linear`层。下面我们来测试该模型的随机初始化和前向计算。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor(26.8951, grad_fn=<SumBackward0>)"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "net = FancyMLP()\n",
    "net(X)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "因为`FancyMLP`和`Sequential`类都是`Module`类的子类，所以我们可以嵌套调用它们。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor(3.0947, grad_fn=<SumBackward0>)"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "class NestMLP(nn.Module):\n",
    "    def __init__(self, **kwargs):\n",
    "        super(NestMLP, self).__init__(**kwargs)\n",
    "        self.net = nn.Sequential(\n",
    "            nn.Linear(20, 64),\n",
    "            nn.ReLU(),\n",
    "            nn.Linear(64, 32),\n",
    "            nn.ReLU()\n",
    "        )\n",
    "        \n",
    "        self.linear3 = nn.Linear(32, 16)\n",
    "        self.activation3 = nn.ReLU()\n",
    "    \n",
    "    def forward(self, x):\n",
    "        return self.activation3(self.linear3(self.net(x)))\n",
    "\n",
    "net = nn.Sequential(NestMLP(), nn.Linear(16, 20), FancyMLP())\n",
    "\n",
    "net(X)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 小结\n",
    "\n",
    "* 可以通过继承`Module`类来构造模型。\n",
    "* `Sequential`类继承自`Module`类。\n",
    "* 虽然`Sequential`类可以使模型构造更加简单，但直接继承`Module`类可以极大地拓展模型构造的灵活性。\n",
    "\n",
    "\n",
    "## 练习\n",
    "\n",
    "* 如果不在`MLP`类的`__init__`函数里调用父类的`__init__`函数，会出现什么样的错误信息？\n",
    "* 如果去掉`FancyMLP`类里面的`item`函数，会有什么问题？\n",
    "* 如果将`NestMLP`类中通过`Sequential`实例定义的`self.net`改为`self.net = [nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 32),nn.ReLU()]`，会有什么问题？\n",
    "\n",
    "\n",
    "## 扫码直达[讨论区](https://discuss.gluon.ai/t/topic/986)\n",
    "\n",
    "![](../img/qr_model-construction.svg)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python [conda env:pytorch]",
   "language": "python",
   "name": "conda-env-pytorch-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.9"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
